FaceAtt: Enhancing Image Captioning with Facial Attributes for Portrait Images
Automated image caption generation is a critical area of research that
enhances accessibility and understanding of visual content for diverse
audiences. In this study, we propose the FaceAtt model, a novel approach to
attribute-focused image captioning that emphasizes the accurate depiction of
facial attributes within images. FaceAtt automatically detects and describes a
wide range of attributes, including emotions, expressions, pointed noses, fair
skin tones, hair textures, attractiveness, and approximate age ranges.
Leveraging deep learning techniques, we explore the impact of different image
feature extraction methods on caption quality and evaluate our model's
performance using metrics such as BLEU and METEOR. FaceAtt uses annotated
portrait attributes as supplementary prior knowledge before captioning. This
addition yields a small but discernible improvement in the resulting scores,
indicating the value of incorporating additional attribute vectors during
training. Furthermore, our
research contributes to the broader discourse on ethical considerations in
automated captioning. This study sets the stage for future research in refining
attribute-focused captioning techniques, with a focus on enhancing linguistic
coherence, addressing biases, and accommodating diverse user needs.
BdSpell: A YOLO-based Real-time Finger Spelling System for Bangla Sign Language
In the domain of Bangla Sign Language (BdSL) interpretation, prior approaches
often imposed a burden on users, requiring them to spell words without hidden
characters, which were subsequently corrected using Bangla grammar rules due to
the missing classes in the BdSL36 dataset. However, this method posed a challenge
in accurately guessing the incorrect spelling of words. To address this
limitation, we propose a novel real-time finger spelling system based on the
YOLOv5 architecture. Our system employs specified rules and numerical classes
as triggers to efficiently generate hidden and compound characters, eliminating
the necessity for additional classes and significantly enhancing user
convenience. Notably, our approach achieves character spelling in an impressive
1.32 seconds with a remarkable accuracy rate of 98%. Furthermore, our YOLOv5
model, trained on 9147 images, demonstrates an exceptional mean Average
Precision (mAP) of 96.4%. These advancements represent a substantial
progression in augmenting BdSL interpretation, promising increased inclusivity
and accessibility for the linguistic minority. This innovative framework,
characterized by compatibility with existing YOLO versions, stands as a
transformative milestone in enhancing communication modalities and linguistic
equity within the Bangla Sign Language community.
Flow field analysis of a pentagonal-shaped bridge deck by unsteady RANS
Long-span cable-stayed bridges are susceptible to dynamic wind effects due to their inherent flexibility. The fluid flow around the bridge deck should be well understood for the efficient design of an aerodynamically stable long-span bridge system. In this work, the aerodynamic features of a pentagonal-shaped bridge deck are explored numerically. The numerical results are compared with past experimental work to assess the capability of two-dimensional unsteady RANS simulation for predicting the aerodynamic features of this type of deck. The influence of the bottom plate slope on aerodynamic response and flow features is investigated. By varying the Reynolds number (2 × 10⁴ to 20 × 10⁴), the aerodynamic behavior at high wind speeds is clarified.
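The Reynolds number range quoted in the abstract above can be related to wind speed with the standard definition Re = U·L/ν. The following is a minimal sketch, not the authors' setup: the reference length `ref_length` (for example a deck dimension of a scaled model) and the kinematic viscosity of air (about 1.5e-5 m²/s near room temperature) are assumptions, and the function names are hypothetical.

```python
# Minimal sketch: relate Reynolds number to wind speed, Re = U * L / nu.
# Assumptions (not from the abstract): the reference length L and the
# kinematic viscosity of air, taken here as roughly 1.5e-5 m^2/s.

AIR_KINEMATIC_VISCOSITY = 1.5e-5  # m^2/s, assumed value for air


def reynolds_number(wind_speed, ref_length, nu=AIR_KINEMATIC_VISCOSITY):
    """Return Re for a wind speed (m/s) and a reference length (m)."""
    return wind_speed * ref_length / nu


def wind_speed_for(re_target, ref_length, nu=AIR_KINEMATIC_VISCOSITY):
    """Return the wind speed (m/s) that gives the target Reynolds number."""
    return re_target * nu / ref_length


if __name__ == "__main__":
    # Hypothetical reference length of 0.1 m, chosen only for illustration.
    for re_target in (2e4, 20e4):
        u = wind_speed_for(re_target, ref_length=0.1)
        print(f"Re = {re_target:.0e} -> U = {u:.1f} m/s")
```

With a fixed reference length, the tenfold span in Reynolds number corresponds to a tenfold span in wind speed.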
Multi-class sentiment classification on Bengali social media comments using machine learning
Multi-class Sentiment Analysis (SA) is an important field of computational linguistics that extracts multiple opinions expressed in a text using NLP and text-mining techniques. Existing research on multi-class SA in the Bengali language has focused on ternary classification, with unsatisfactory classification performance. Achieving higher scores is challenging because of the peculiarities of Bengali text, the lack of ground-truth datasets, and the scarcity of preprocessing tools. Moreover, no prior research has shown that deep learning algorithms perform better on four sentiment classes. Therefore, we propose a supervised deep learning classifier based on CNN and LSTM to conduct multi-class SA on Bengali social media comments labelled as sexual, religious, political, and acceptable. The study aims to achieve maximum accuracy using the proposed model and to provide a comparative analysis with baseline models. Six machine learning models with two different feature extraction techniques serve as baselines. The proposed CLSTM architecture substantially improves SA performance, reaching 85.8% accuracy and an F1 score of 0.86 on a labelled dataset of 42,036 Facebook comments. A web application based on the proposed model and the highest-performing baseline model was built to detect the real-life sentiment of social media comments.
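For readers unfamiliar with the two metrics reported above, the sketch below shows how accuracy and a macro-averaged F1 score can be computed for a multi-class labelling task. It is an illustration only: the abstract does not state which F1 averaging the authors used, so macro averaging is an assumption, and the toy labels are invented for the example.

```python
# Minimal sketch of accuracy and macro-averaged F1 for multi-class labels.
# Assumption: macro averaging; the abstract does not say which variant
# of F1 was reported.


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1 scores."""
    scores = []
    for label in sorted(set(y_true) | set(y_pred)):
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == label and p != label)
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when the class never occurs.
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # Toy labels, invented for illustration.
    y_true = ["political", "political", "acceptable", "acceptable"]
    y_pred = ["political", "acceptable", "acceptable", "acceptable"]
    print(accuracy(y_true, y_pred), macro_f1(y_true, y_pred))
```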
A Comparative Analysis on Suicidal Ideation Detection Using NLP, Machine, and Deep Learning
Social networks are essential resources for obtaining information about people’s opinions and feelings towards various issues, as they share their views with their friends and family. Suicidal ideation detection via online social network analysis has emerged in recent years as an essential research topic with significant difficulties in the fields of NLP and psychology. With proper exploitation of the information in social media, the complicated early symptoms of suicidal ideation can be discovered, which can save many lives. This study offers a comparative analysis of multiple machine learning and deep learning models to identify suicidal thoughts from the social media platform Twitter. The principal purpose of our research is to achieve better model performance than prior research works, to recognize early indications with high accuracy, and to help prevent suicide attempts. We applied text pre-processing and feature extraction approaches such as CountVectorizer and word embedding, and trained several machine learning and deep learning models for this purpose. Experiments were conducted on a dataset of 49,178 instances retrieved from live tweets matching 18 suicidal and non-suicidal keywords using the Python Tweepy API. Our experimental findings reveal that the RF model achieves the highest classification score among the machine learning algorithms, with an accuracy of 93% and an F1 score of 0.92. Training deep learning classifiers with word embeddings improves on the ML models, however: the BiLSTM model reaches an accuracy of 93.6% and an F1 score of 0.93.
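The count-based feature extraction mentioned above can be illustrated with a small bag-of-words sketch. This is a simplified pure-Python stand-in for the idea behind CountVectorizer, not the scikit-learn API and not the authors' pipeline; the tokenizer rule, the function names, and the example sentences are all assumptions made for illustration.

```python
# Minimal bag-of-words sketch: map each text to a vector of token counts.
# Simplified stand-in for the idea behind CountVectorizer; the tokenizer
# and all names here are hypothetical.
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def build_vocabulary(texts):
    """Assign a column index to every distinct token, in sorted order."""
    tokens = sorted({tok for text in texts for tok in tokenize(text)})
    return {tok: idx for idx, tok in enumerate(tokens)}


def vectorize(text, vocabulary):
    """Return the token-count vector of a text; unknown tokens are ignored."""
    counts = Counter(tokenize(text))
    vector = [0] * len(vocabulary)
    for tok, n in counts.items():
        if tok in vocabulary:
            vector[vocabulary[tok]] = n
    return vector


if __name__ == "__main__":
    corpus = ["I feel fine", "I feel so so tired"]  # invented examples
    vocab = build_vocabulary(corpus)
    print(vocab)
    print([vectorize(text, vocab) for text in corpus])
```

Vectors of this kind are what a classical classifier such as a random forest would consume, whereas word embeddings replace the sparse counts with dense learned vectors.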
Theoretical Investigation on the Impact of Two HDR Dampers on First Modal Damping Ratio of Stay Cable
Stay cables are among the vital components of a cable-stayed bridge. Due to their flexible nature, stay cables are vulnerable to external excitation and often vibrate with large amplitude under wind action, which leads to fatigue failure of the cables. To suppress such large-amplitude vibration by improving the damping ratio of the cable, various dampers such as magnetorheological dampers, friction dampers, oil dampers, and high damping rubber (HDR) dampers are used and have gained popularity over time. This paper focuses on improving the damping ratio of stay cables using a combination of two HDR dampers. First, a theoretical model that considers cable bending stiffness is formulated to evaluate the damping effect of the cable-HDR damper system. Then, the impact of various design parameters of the HDR dampers on cable damping, considering the cable stiffness, is analyzed. The comparative analysis of results shows that the considered parameters, such as loss factor, spring factor, and installation location of the dampers, strongly affect the damping ratio of the stay cable. Finally, the optimal parameters of the two HDR dampers are proposed for damper design.
Accessible Data Representation with Natural Sound
Sonification translates data into non-speech audio. Such auditory representations can make data visualization accessible to people who are blind or have low vision (BLV). This paper presents a sonification method for translating common data visualization into a blend of natural sounds. We hypothesize that people’s familiarity with sounds drawn from nature, such as birds singing in a forest, and their ability to listen to these sounds in parallel, will enable BLV users to perceive multiple data points being sonified at the same time. Informed by an extensive literature review and a preliminary study with 5 BLV participants, we designed an accessible data representation tool, Susurrus, that combines our sonification method with other accessibility features, such as keyboard interaction and text-to-speech feedback. Finally, we conducted a user study with 12 BLV participants and report the potential and application of natural sounds for sonification compared to existing sonification tools. https://doi.org/10.1145/3544548.358108